Using Validation Sets to Avoid Overfitting in AdaBoost
Authors
Abstract
AdaBoost is a well-known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. Half of the training set is removed to form the validation set. The sequence of base classifiers produced by AdaBoost from the training set is applied to the validation set, creating a modified set of weights. The training and validation sets are then switched, and a second pass is performed. The final classifier votes using both sets of weights. We show that our algorithm achieves comparable performance on standard datasets and improved performance when classification noise is added.
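The two-pass procedure described in the abstract can be sketched in code. The sketch below is illustrative, not the authors' exact algorithm: it trains standard AdaBoost with decision stumps on each half of the data, then replaces each base classifier's training-derived vote weight with one computed from its error on the held-out half (the precise reweighting formula used in the paper may differ), and finally votes with both weighted ensembles.

```python
import numpy as np

def stump_predict(stump, X):
    j, thr, s = stump
    return s * np.where(X[:, j] >= thr, 1.0, -1.0)

def fit_stump(X, y, w):
    # exhaustive search over (feature, threshold, sign) for the
    # decision stump with lowest weighted training error
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for s in (1.0, -1.0):
                pred = s * np.where(X[:, j] >= thr, 1.0, -1.0)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, s)
    return best

def adaboost(X, y, T=20):
    # standard AdaBoost with decision stumps; labels y in {-1, +1}.
    # Only the sequence of base classifiers is kept: their vote
    # weights will be re-derived on the validation half.
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps = []
    for _ in range(T):
        stump = fit_stump(X, y, w)
        pred = stump_predict(stump, X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        stumps.append(stump)
    return stumps

def validation_weights(stumps, Xv, yv):
    # vote weights from each base classifier's error on the held-out
    # half (illustrative rule, using the usual AdaBoost alpha formula)
    alphas = []
    for stump in stumps:
        err = np.clip(np.mean(stump_predict(stump, Xv) != yv), 1e-10, 1 - 1e-10)
        alphas.append(0.5 * np.log((1 - err) / err))
    return np.array(alphas)

def vote(X, ensembles):
    # final classifier: weighted vote over both passes' base classifiers
    score = np.zeros(len(X))
    for stumps, alphas in ensembles:
        for stump, a in zip(stumps, alphas):
            score += a * stump_predict(stump, X)
    return np.where(score >= 0, 1.0, -1.0)
```

A typical use mirrors the abstract: split the data into halves A and B, boost on A and reweight on B, swap, and combine both ensembles in `vote`. Base classifiers that fit noise in their training half tend to receive small or negative weights on the other half, which is the intended regularizing effect.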
Similar resources
Using Validation to Avoid Overfitting in Boosting
AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because it focuses on misclassified examples, which may be noisy. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. The training set is partitioned in...
The imprecise Dirichlet model as a basis for a new boosting classification algorithm
A new algorithm for ensemble construction, based on adaptively restricting the set of weights of examples in the training data to avoid overfitting and to reduce the number of iterations, is proposed in the paper. The algorithm, called IDMBoost (Imprecise Dirichlet Model Boost), applies Walley's imprecise Dirichlet model for modifying the restricted sets of weights depending on the number and location of clas...
Regularizing AdaBoost
AdaBoost is an iterative algorithm to construct classifier ensembles. It quickly achieves high accuracy by focusing on objects that are difficult to classify. Because of this, AdaBoost tends to overfit when subjected to noisy datasets. We observe that this can be partially prevented with the use of validation sets, taken from the same noisy training set. But using less than th...
A Fast Scheme for Feature Subset Selection to Avoid Overfitting in AdaBoost
AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We show that with the introduction of a scoring function and the random selection of training data it is possible to create a smaller set of feature vectors. The selection of th...
Avoiding Boosting Overfitting by Removing Confusing Samples
Boosting methods are known to exhibit noticeable overfitting on some datasets, while being immune to overfitting on others. In this paper we show that standard boosting algorithms are not appropriate in the case of overlapping classes. This inadequacy is likely to be the major source of boosting overfitting while working with real-world data. To verify our conclusion we use the fact that an...
Publication date: 2006